A Crash Test with Linguistica in Modern Greek: The Case of Derivational Affixes and Bound Stems

Authors

  • Athanasios Karasimos
  • Evanthia Petropoulou
Abstract

This paper contributes to the ongoing search for a suitable model for the computational treatment of Greek morphology. Focusing on unsupervised morphology learning, and particularly on the Linguistica model of Goldsmith (2001), we attempt a computational treatment of specific word formation phenomena in Modern Greek (MG), such as suffixation and compounding with bound stems, using various corpora. The system cannot accept any morphological rule as input (hence the term 'unsupervised'), and this greatly reduces its parsing efficiency, especially in morphologically rich languages such as MG. Specifically, neither the rich allomorphy nor the complex combinability of morphemes in MG appears to be treated efficiently by this technique, resulting in low rates of correct word segmentation (22% for inflectional suffixes and 13% for derivational ones), as well as the recognition of false morphemes.

1. Unsupervised Morphology Learning: A theoretical approach

1.1. An Introduction to Unsupervised Morphology Learning

Compared with computational analyses of syntax, computational work on morphology has been relatively scarce. According to Roark and Sproat (2007), the absence of a corpus of morphologically annotated words hampered the development of a machine learning morphological system that could rival a complex morphological analyzer such as the one proposed by Koskenniemi (1983). However, close to the dawn of the new millennium, interest in statistical models of morphology, particularly in unsupervised (or lightly supervised) morphology learning from unannotated corpora, increased rapidly. Special attention has been paid to automatic, essentially unsupervised, methods for the discovery of morphological alternations. However, allomorphy poses a serious problem for both tasks.
In treating allomorphy, the goal is to find related morphological forms of the same word, such as κύμα and κύματα (kima ~ kimat(a), 'wave'), which are not the product of any phonological and morphological rules. Since most recent research has been carried out within the field of unsupervised morphological learning, we will focus our discussion and criticism on this approach, and specifically on the Minimum Description Length (MDL) model proposed by Goldsmith (2001) [other recent works in the same direction are Yarowsky and Wicentowski 2001, Schone and Jurafsky 2001, Creutz and Lagus 2002]. Goldsmith's (2001) theory and the implementation of his program Linguistica are based on Rissanen's (1989) MDL framework. His article is not the first work on unsupervised morphology learning, as there are three other approaches by previous researchers. Nevertheless, it is certainly the most cited, and is considered the standard model against which other systems are compared.

1.2. Goldsmith's Minimum Description Length (2001)

Goldsmith's system starts with a very large corpus of unannotated text and produces a range of signatures along with the words that belong to these signatures. A signature is a set of affixes (prefixes or suffixes) that combine with a given set of stems (Goldsmith, 2001; Roark and Sproat, 2007). An example suffix signature in English could be NULL.ed.ing.s, which combines with the stems jump, laugh, walk, talk, etc., all of which take the signature's suffixes to create words such as jumpø, jumped, jumping and jumps. Other examples of signatures are e.ed.ing, NULL.s, NULL.ing.s, NULL.er.est.ly, etc. A closer look at the signatures reveals that the sets are not always complete. Usually the past tense suffixes are absent, even for regular verb stems.
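
The notion of a signature, and the way gaps in the corpus leave it incomplete, can be illustrated with a minimal sketch. This is not Goldsmith's algorithm: Linguistica discovers candidate affixes with heuristics and evaluates analyses with MDL, whereas the sketch below assumes a small hand-picked suffix inventory and simply groups stems by the suffixes attested with them. The function name `find_signatures`, the inventory `SUFFIXES`, the `min_stem_len` threshold and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical suffix inventory; the empty string stands for the NULL suffix.
SUFFIXES = ("", "ed", "ing", "s")


def find_signatures(words, suffixes=SUFFIXES, min_stem_len=3):
    """Group stems by the set of suffixes attested with them in `words`."""
    vocab = set(words)
    stem_to_suffixes = defaultdict(set)
    for word in vocab:
        for suffix in suffixes:
            if suffix and not word.endswith(suffix):
                continue
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= min_stem_len:
                stem_to_suffixes[stem].add(suffix)

    signatures = defaultdict(list)
    for stem, attested in stem_to_suffixes.items():
        if len(attested) < 2:  # a lone suffix gives no evidence of a paradigm
            continue
        label = ".".join(s or "NULL" for s in sorted(attested))
        signatures[label].append(stem)
    return {label: sorted(stems) for label, stems in signatures.items()}


# Toy corpus: "bombed" is deliberately missing, "felt" is an irregular past.
corpus = [
    "jump", "jumped", "jumping", "jumps",
    "walk", "walked", "walking", "walks",
    "feel", "feeling", "feels", "felt",
    "bomb", "bombing", "bombs",
]
signatures = find_signatures(corpus)
for label, stems in sorted(signatures.items()):
    print(label, stems)
```

With this toy corpus, jump and walk fall under NULL.ed.ing.s, while bomb ends up in NULL.ing.s together with feel. The regular verb bomb is therefore grouped with an irregular one only because bombed is unattested, and felt is never related to feel, since only affixation is modeled.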
For example, Roark and Sproat (2007: 120) point out that the signature NULL.er.ing.s proposed by Goldsmith (2001: 179), which includes stems such as blow, broadcast, drink and feel, does not display the -ed suffix, since these verbs are irregular in their past tense form. However, the -ed suffix is also absent from stems such as bomb and farm, whose past tense forms (bombed and farmed), although regular, happened not to occur in the corpus. Goldsmith discusses in general terms some problems with signatures and notes that his system is incapable of handling alternations (e.g. allomorphs) such as feel/felt, since it deals only with affixation.

1.3. MDL Model Criticism

As will be demonstrated in the next section, this kind of allomorphic alternation can be an enormous problem if one tries to apply an Unsupervised Morphology Learning Model (UMLM) to, for example, the Greek language, which exhibits a high degree of complex allomorphy in every word formation process (inflection,


Similar resources

Bound Morpheme Frequencies in the Performance of Iranian English Language Undergraduates and English Language Materials Developers in Written Descriptive Tasks

This mini-corpus, cross-linguistic, comparative, and norm-referenced study aims to identify the most frequently used affixes in written descriptive tasks performed by English language materials developers (ELMDs) and Iranian English language undergraduates (IELUs). Writing samples from both groups were studied and analyzed through affixation principles. The frequency of ...

A Description of Derivational Affixes in Sarhaddi Balochi of Granchin

The Sarhaddi Balochi dialect, a variety of Western (Rakhshani) Balochi, employs derivation through affixation as one of its word formation processes. The purpose of this article is to present a synchronic description of the way(s) different derivational affixes function in forming complex words in Sarhaddi Balochi as spoken in Granchin district, located about 35 km to the southeast of Kha...

Rule-based versus associative processes in derivational morphology.

The present article examines whether derivational morphology shows evidence of an associative memory structure. A distributional analysis of stems of attested derivational forms revealed evidence of clustering around phonological properties (gangs) for all nonneutral affixes but only a few neutral affixes. Subjects' acceptability ratings for novel complex words revealed sensitivity to the gang ...

Iranian EFL Learners' Processing of English Derived Words

An interesting area of psycholinguistic inquiry is to discover the way morphological structures are stored in the human mind and how they are retrieved during comprehension or production of language. The current study probed into what goes on in the mind of EFL learners when processing derivational morphology and how English and Persian derivational suffixes are processed. 60 Iranian EFL learne...

A Morphological Processor For Modern Greek

In this paper, we present a morphological processor for Modern Greek. From the linguistic point of view, we try to elucidate the complexity of the inflectional system using a lexical model which follows the recent work by Lieber 1980, Selkirk 1982, Kiparsky 1982, and others. The implementation is based on the concept of "validation grammars" (Courtin 1977). The morphological processing is co...


Publication date: 2010